PSO and Statistical Clustering for Feature Selection: A New Representation

Authors

  • Bach Hoai Nguyen
  • Bing Xue
  • Ivy Liu
  • Mengjie Zhang
Abstract

Classification tasks often involve a large number of features, where irrelevant or redundant features may reduce classification performance. Such tasks typically require a feature selection process to choose a small subset of relevant features for classification. This paper proposes a new representation in particle swarm optimisation (PSO) that uses statistical clustering information to solve feature selection problems. The proposed algorithm is examined and compared with two conventional feature selection algorithms and two existing PSO-based algorithms on eight benchmark datasets of varying difficulty. The experimental results show that the proposed algorithm can be successfully used for feature selection: it considerably reduces the number of features while achieving similar or significantly higher classification accuracy than using all features. It achieves significantly better classification accuracy than one conventional method, although it selects a larger number of features. Compared with the other conventional method and the two PSO-based methods, the proposed algorithm achieves better performance in terms of both classification performance and the number of features.
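
The abstract does not spell out the new representation, so the following is only a rough sketch of one way clustering information could shape a PSO particle. It assumes (this is not confirmed by the abstract) that each dimension of a particle corresponds to one feature cluster and decodes to at most one feature from that cluster. The names `decode`, `pso_select`, and `toy_fitness`, the threshold, the PSO coefficients, and the toy fitness are all hypothetical; a real study would use classification accuracy as the fitness.

```python
import random


def decode(position, clusters, threshold=0.5):
    """Map a position (one value in [0, 1] per cluster) to feature indices.

    Illustrative assumption: a value below `threshold` selects nothing from
    the cluster; otherwise the range [threshold, 1] is split evenly among
    the cluster's features and exactly one of them is picked.
    """
    selected = []
    for value, cluster in zip(position, clusters):
        if value < threshold:
            continue  # no feature taken from this cluster
        scaled = (value - threshold) / (1.0 - threshold)
        index = min(int(scaled * len(cluster)), len(cluster) - 1)
        selected.append(cluster[index])
    return sorted(selected)


def pso_select(clusters, fitness, n_particles=10, n_iters=30, seed=0):
    """Plain global-best PSO over cluster-wise positions (maximises fitness)."""
    rng = random.Random(seed)
    dim = len(clusters)
    w, c1, c2 = 0.7298, 1.49618, 1.49618  # common settings, not from the paper
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(decode(p, clusters)) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep every coordinate inside [0, 1] so decoding stays valid.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = fitness(decode(pos[i], clusters))
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return decode(gbest, clusters), gbest_fit


# Toy stand-in for classifier accuracy: reward "relevant" features,
# penalise subset size. Purely hypothetical data.
relevant = {0, 3, 5}


def toy_fitness(subset):
    return len(relevant.intersection(subset)) - 0.1 * len(subset)


clusters = [[0, 1], [2, 3, 4], [5]]
subset, score = pso_select(clusters, toy_fitness)
print(subset, score)
```

Under this assumed encoding, the search space has one dimension per cluster rather than one per feature, and a decoded subset can never contain two features from the same cluster, which is one way redundancy could be limited.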

Similar resources

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text motivated researchers to use...

Optimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines

In this paper, principles and existing feature selection methods for classifying and clustering data are introduced. To that end, categorizing frameworks for finding selected subsets, namely search-based and non-search-based procedures, as well as evaluation criteria and data mining tasks, are discussed. Next, a platform is developed as an intermediate step toward developing an intell...

Feature Selection in Structural Health Monitoring Big Data Using a Meta-Heuristic Optimization Algorithm

This paper focuses on the processing of structural health monitoring (SHM) big data. Extracted features of a structure are reduced using an optimization algorithm to find a minimal subset of salient features by removing noisy, irrelevant and redundant data. The PSO-Harmony algorithm is introduced for feature selection to enhance the capability of the proposed method for processing the measure...

A Novel Feature Selection Algorithm using Particle Swarm Optimization for Cancer Microarray Data

Microarray data are often extremely asymmetric in dimensionality, highly redundant and noisy. Most genes are believed to be uninformative with respect to the studied classes. This paper proposes a novel feature selection approach for the classification of high-dimensional cancer microarray data, which uses a filtering technique such as the signal-to-noise ratio (SNR) score and an optimization technique such as Pa...

Publication date: 2014